What is claimed is:



CLAIMS 



1. A database of unique nucleotide sequences, said database comprising nucleotide
sequences greater than about 100 nucleotides in length. 

2. A database of unique nucleotide sequences, said database comprising nucleotide
sequences between about 100-500 nucleotides in length. 

3. A database of unique nucleotide sequences, said database comprising nucleotide
sequences between about 100-1000 nucleotides in length. 
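As an illustration only, the length limits of claims 1-3 can be read as a filter over candidate sequences. The sketch below assumes that "about 100 nucleotides" is treated as a hard lower bound of 100, that the upper bounds of claims 2 and 3 are 500 and 1000, and that "unique" means no exact, case-insensitive duplicate. `unique_in_range` is a hypothetical helper name.

```python
from typing import Iterable, List, Optional


def unique_in_range(seqs: Iterable[str], min_len: int = 100,
                    max_len: Optional[int] = None) -> List[str]:
    """Keep each distinct sequence once if min_len <= length <= max_len.

    Assumption: comparison is case-insensitive and the first occurrence wins.
    max_len=None means no upper bound (claim 1); 500 or 1000 mirror claims 2-3.
    """
    seen = set()
    result = []
    for s in seqs:
        key = s.upper()
        if len(key) < min_len:
            continue
        if max_len is not None and len(key) > max_len:
            continue
        if key in seen:  # already entered once; skip the duplicate
            continue
        seen.add(key)
        result.append(key)
    return result
```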

4. The database of any of claims 1-3, wherein said nucleotide sequence is a
deoxyribonucleotide sequence. 

5. The database of any of claims 1-3, wherein said nucleotide sequence is a ribonucleotide
sequence. 

6. The database of any of claims 1-3, wherein said sequences are derived from animal DNA
or RNA. 

7. The database of claim 6, wherein said animal is a human.

8. The database of claim 6, wherein said animal is a mouse.

9. The database of any of claims 1-3, wherein said sequences are derived from plant DNA
or RNA.


10. The database of claim 9, wherein said plant is a single-cell plant.

11. The database of any of claims 1-3, wherein said sequences are derived from fungal DNA
or RNA. 

12. The database of any of claims 1-3, wherein said sequences are derived from DNA or
RNA of a microorganism or virus. 

13. The database of any of claims 1-3, wherein said sequences are derived from DNA or
RNA of a single-cell eukaryote. 

14. The database of any of claims 1-3, wherein said sequences are derived from synthetic
man-made DNA or RNA. 

15. The database of any of claims 1-3, wherein said sequences are postulated based upon
amino acid sequences. 
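Claim 15 does not say how a nucleotide sequence is postulated from an amino acid sequence. One simple possibility, sketched below under that assumption, is back-translation using a single representative codon per residue taken from the standard genetic code. The particular codon choices and the names `REPRESENTATIVE_CODON` and `back_translate` are hypothetical; a real pipeline might instead use codon-usage tables or degenerate IUPAC codes.

```python
# One representative codon per one-letter residue (standard genetic code).
# The choice among synonymous codons is arbitrary and purely illustrative.
REPRESENTATIVE_CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "TCT", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop
}


def back_translate(protein: str) -> str:
    """Postulate one DNA sequence that encodes the given protein."""
    try:
        return "".join(REPRESENTATIVE_CODON[aa] for aa in protein.upper())
    except KeyError as exc:
        # Ambiguity codes such as B, Z, or X have no single codon here.
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None
```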

16. The database of any of claims 1-3, wherein said database is encoded in a biological
medium. 

17. The database of any of claims 1-3, wherein said database is encoded in a written medium.

18. The database of any of claims 1-3, wherein said database is encoded in an electronic
medium. 

19. The database of claim 18, wherein said electronic medium is a computer-readable
medium. 

20. The database of claim 19, wherein said computer-readable medium is addressable
through an internet connection.

21. A kit for analyzing nucleotide sequences comprising:

an electronic medium readable by a computer, said medium encoding a database 
of unique nucleotide sequences, said database comprising nucleotide sequences greater 
than about 100 nucleotides in length. 

22. A kit for analyzing nucleotide sequences comprising:

an electronic medium readable by a computer, said medium encoding a database 
of unique nucleotide sequences, said database comprising nucleotide sequences greater 
than about 100 nucleotides in length; and,

instructions for the use of said database. 

23. A kit for analyzing nucleotide sequences comprising:

an electronic medium readable by a computer, said medium encoding a database 
of unique nucleotide sequences, said database comprising nucleotide sequences greater 
than about 100 nucleotides in length; 

instructions for the use of said database; and,

a computer.



24. An improved database of nucleotide sequences, said database comprising nucleotide 
sequences greater than about 100 nucleotides in length, wherein said improvement 
consists entirely of only unique nucleotide sequences entered into said database only one 
time. 

25. A computer-generated database consisting of only unique nucleotide sequences, said 
database comprising nucleotide sequences greater than about 100 nucleotides in length. 

26. A method for generating a database of sequences that are greater than or equal to about 
100 nucleotides in length, wherein each sequence is entered into the database only one 
time, the method comprising the steps of:

selecting a query sequence from a redundant database; 
masking said query sequence with known repeat sequences; 
comparing said masked query sequence with identified unique sequences; 
identifying a unique portion of the query sequence that does not have a similar sequence 
in any of the identified unique sequences; and 

adding the unique portion of the query sequence to a unique database. 
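The steps of claim 26 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: repeat masking is assumed to be exact string replacement with `N`, and "a similar sequence" is approximated by sharing any k-mer (k = 20 by default) with a sequence already in the unique database, standing in for a search tool such as BLAST. All function names and the `k` and `min_len` parameters are hypothetical.

```python
from typing import Iterable, List, Set


def mask_repeats(query: str, repeats: Iterable[str]) -> str:
    """Mask exact (non-overlapping) occurrences of known repeats with 'N'."""
    seq = query.upper()
    for rep in repeats:
        rep = rep.upper()
        if rep:
            seq = seq.replace(rep, "N" * len(rep))
    return seq


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_portions(masked: str, unique_db: List[str], k: int = 20,
                    min_len: int = 100) -> List[str]:
    """Runs of the masked query that share no k-mer with unique_db."""
    index: Set[str] = set()
    for s in unique_db:
        index |= kmers(s, k)
    # Masked (repeat) positions are never treated as unique.
    covered = [c == "N" for c in masked]
    for i in range(len(masked) - k + 1):
        if masked[i:i + k] in index:
            for j in range(i, i + k):
                covered[j] = True
    runs, start = [], None
    for i, is_cov in enumerate(covered + [True]):  # sentinel closes the last run
        if not is_cov and start is None:
            start = i
        elif is_cov and start is not None:
            if i - start >= min_len:
                runs.append(masked[start:i])
            start = None
    return runs


def build_unique_database(redundant_db: Iterable[str], repeats: Iterable[str],
                          k: int = 20, min_len: int = 100) -> List[str]:
    """Select, mask, compare, identify, and add, once per query sequence."""
    repeats = list(repeats)
    unique_db: List[str] = []
    for query in redundant_db:
        masked = mask_repeats(query, repeats)
        unique_db.extend(unique_portions(masked, unique_db, k, min_len))
    return unique_db
```

Because each query is compared against everything added before it, a sequence that reappears later in the redundant database contributes nothing new, so each unique portion is entered only one time.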

27. A database product of the process of claim 26. 

28. The method of claim 26, wherein said sequence is a deoxyribonucleotide sequence. 

29. The method of claim 26, wherein said sequence is a ribonucleotide sequence.

30. The method of claim 26, wherein said sequences are derived from animal DNA or RNA.

31. The method of claim 30, wherein said animal is a human.

32. The method of claim 30, wherein said animal is a mouse.

33. The method of claim 26, wherein said sequences are derived from plant DNA or RNA.

34. The method of claim 33, wherein said plant is a single-cell plant.

35. The method of claim 26, wherein said sequences are derived from fungal DNA or RNA.

36. The method of claim 26, wherein said sequences are derived from DNA or RNA of a
microorganism or virus. 

37. The method of claim 26, wherein said sequences are derived from DNA or RNA of a
single-cell eukaryote. 

38. The method of claim 26, wherein said sequences are derived from synthetic man-made
DNA or RNA. 



39. The method of claim 26, wherein said sequences are postulated based upon amino acid
sequences. 

40. The method of claim 26, wherein said database is encoded in a biological medium.

41. The method of claim 26, wherein said database is encoded in a written medium.

42. The method of claim 26, wherein said database is encoded in an electronic medium.

43. The method of claim 42, wherein said electronic medium is a computer-readable
medium. 

44. The method of claim 43, wherein said computer-readable medium is addressable through
an internet connection. 

45. The method of claim 26, wherein said redundant database is a Public Domain Database.

46. The method of claim 45, wherein said Public Domain Database is GenBank.

47. The method of claim 45, wherein said Public Domain Database is dbEST.

48. The method of claim 45, wherein said Public Domain Database is TIGR.

49. The method of claim 45, wherein said Public Domain Database is SwissProt.

50. The method of claim 26, wherein said comparing step further utilizes a Database Search
Algorithm.

51. The method of claim 50, wherein said Database Search Algorithm is BLAST.

52. The method of claim 50, wherein said Database Search Algorithm is FASTA.

53. The method of claim 50, wherein said Database Search Algorithm is Smith-Waterman.
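For the Smith-Waterman option, a compact local-alignment scorer is sketched below. It assumes a simple match/mismatch score and a linear gap penalty (values chosen for illustration only) and returns just the best local score, not the alignment itself; `smith_waterman` is a hypothetical name.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    cols = len(b) + 1
    prev = [0] * cols  # DP row i-1
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols  # DP row i; column 0 stays 0 (local alignment)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at 0 lets an alignment restart anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

A comparing step could, for example, treat a query region as similar to a stored unique sequence when this score exceeds a chosen threshold; such a threshold is an assumption and is not stated in the claims.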

54. The method of claim 26, wherein said comparing step further utilizes a Scoring Matrix
Program. 

55. The method of claim 54, wherein said Scoring Matrix Program is PAM.

56. The method of claim 54, wherein said Scoring Matrix Program is BLOSUM.
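PAM and BLOSUM are amino-acid substitution matrices. The sketch below shows only the general mechanism of scoring with a substitution matrix stored as a dictionary keyed by residue pairs, using a hypothetical nucleotide matrix that penalizes transitions (A/G, C/T) less than transversions. The numeric values are illustrative and are not taken from PAM or BLOSUM; a published matrix could be loaded into the same dictionary shape.

```python
from itertools import product
from typing import Dict, Tuple

BASES = "ACGT"
PURINES = {"A", "G"}


def build_matrix(match: int = 5, transition: int = -1,
                 transversion: int = -4) -> Dict[Tuple[str, str], int]:
    """Hypothetical nucleotide substitution matrix keyed by (x, y)."""
    matrix = {}
    for x, y in product(BASES, repeat=2):
        if x == y:
            matrix[(x, y)] = match
        elif (x in PURINES) == (y in PURINES):  # purine-purine or pyrimidine-pyrimidine
            matrix[(x, y)] = transition
        else:
            matrix[(x, y)] = transversion
    return matrix


def ungapped_score(a: str, b: str, matrix: Dict[Tuple[str, str], int]) -> int:
    """Sum matrix scores over aligned positions of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(matrix[(x, y)] for x, y in zip(a.upper(), b.upper()))
```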
57. The process of Figure 1A.

58. The process of Figure 1B.

59. The process of Figure 2.

60. A method for identifying unique nucleotide sequences, the method comprising the steps
of: 

selecting a query sequence from a redundant database file; 
comparing the query sequence with a repeat database file and a unique database file;

analyzing the results of the comparison of the query sequence with the repeat database
file and the unique database file to determine whether one or more nucleotide sequences within
the repeat database file and the unique database file match a nucleotide sequence within the
query sequence; and

identifying any unique nucleotide sequences within the query sequence that do not
match any nucleotide sequence within the repeat database file and the unique database file.
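The method above compares a query against a repeat database file and a unique database file. The sketch below assumes both files are in FASTA format and, as a stand-in for the unspecified comparison, marks a query position as matched when it lies inside any k-mer shared with either file. `read_fasta`, `unmatched_intervals`, and the parameter `k` are hypothetical; the returned half-open intervals are the candidate unique nucleotide sequences.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text (e.g. an open file) into {record_id: sequence}."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            # Record id is the first word after '>' (empty if none).
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def unmatched_intervals(query: str, repeat_db: Dict[str, str],
                        unique_db: Dict[str, str],
                        k: int = 20) -> List[Tuple[int, int]]:
    """Half-open [start, end) query intervals matched by neither database."""
    index = set()
    for seq in list(repeat_db.values()) + list(unique_db.values()):
        index.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    query = query.upper()
    matched = [False] * len(query)
    for i in range(len(query) - k + 1):
        if query[i:i + k] in index:
            matched[i:i + k] = [True] * k
    intervals, start = [], None
    for i, hit in enumerate(matched + [True]):  # sentinel closes the last interval
        if not hit and start is None:
            start = i
        elif hit and start is not None:
            intervals.append((start, i))
            start = None
    return intervals
```

In use, each database would be loaded with something like `with open(path) as fh: repeats = read_fasta(fh)`, where `path` is whatever file holds the repeat or unique sequences.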